 Prostate Cancer


Supplementary Materials - VIME: Extending the Success of Self-and Semi-supervised Learning to Tabular Domain

Neural Information Processing Systems

Figure 2: The proposed self- and semi-supervised learning frameworks on exemplary tabular data. Figure 3: The proposed data corruption procedure. All three datasets are provided with separate training and testing sets. The entire US prostate cancer dataset is used as the unlabeled data. For continuous variables, we report the median value with the 25th and 75th percentiles.


Prostate-VarBench: A Benchmark with Interpretable TabNet Framework for Prostate Cancer Variant Classification

Tavara, Abraham Francisco Arellano, Kumar, Umesh, Pradeepkumar, Jathurshan, Sun, Jimeng

arXiv.org Artificial Intelligence

Variants of Uncertain Significance (VUS) limit the clinical utility of prostate cancer genomics by delaying diagnosis and therapy when evidence for pathogenicity or benignity is incomplete. Progress is further limited by inconsistent annotations across sources and the absence of a prostate-specific benchmark for fair comparison. We introduce Prostate-VarBench, a curated pipeline for creating prostate-specific benchmarks that integrates COSMIC (somatic cancer mutations), ClinVar (expert-curated clinical variants), and TCGA-PRAD (prostate tumor genomics from The Cancer Genome Atlas) into a harmonized dataset of 193,278 variants supporting patient- or gene-aware splits to prevent data leakage. To ensure data integrity, we corrected a Variant Effect Predictor (VEP) issue that merged multiple transcript records, introducing ambiguity in clinical significance fields. We then standardized 56 interpretable features across eight clinically relevant tiers, including population frequency, variant type, and clinical context. AlphaMissense pathogenicity scores were incorporated to enhance missense variant classification and reduce VUS uncertainty. Building on this resource, we trained an interpretable TabNet model to classify variant pathogenicity, whose step-wise sparse masks provide per-case rationales consistent with molecular tumor board review practices. On the held-out test set, the model achieved 89.9% accuracy with balanced class metrics, and the VEP correction yielded a 6.5% absolute reduction in VUS.
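The gene-aware split mentioned in the abstract can be illustrated with a minimal sketch. This is not the Prostate-VarBench pipeline itself: the function name `group_aware_split`, the record layout, and the placeholder gene labels are all assumptions made for illustration. The idea shown is only that every record belonging to one group (a gene or a patient) is assigned to the same side of the split, so near-duplicate variants cannot leak from training into testing.

```python
import random
from collections import defaultdict


def group_aware_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test.

    Note: test_fraction is applied to the number of groups, not records,
    so the record-level ratio depends on group sizes.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle group identifiers deterministically, then cut the list.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)
    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids = set(group_ids[:n_test])

    train = [r for g in group_ids if g not in test_ids for r in groups[g]]
    test = [r for g in group_ids if g in test_ids for r in groups[g]]
    return train, test


# Hypothetical toy records; gene labels are placeholders, not real data.
variants = [
    {"gene": "GENE_A", "variant": "a1"},
    {"gene": "GENE_A", "variant": "a2"},
    {"gene": "GENE_B", "variant": "b1"},
    {"gene": "GENE_C", "variant": "c1"},
    {"gene": "GENE_C", "variant": "c2"},
    {"gene": "GENE_D", "variant": "d1"},
]
train, test = group_aware_split(variants, "gene", test_fraction=0.25)
train_genes = {r["gene"] for r in train}
test_genes = {r["gene"] for r in test}
print(train_genes & test_genes)  # empty set: no gene is shared
```

A patient-aware split follows the same pattern with a patient identifier as `group_key`.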


AI-driven software for automated quantification of skeletal metastases and treatment response evaluation using Whole-Body Diffusion-Weighted MRI (WB-DWI) in Advanced Prostate Cancer

Candito, Antonio, Blackledge, Matthew D, Holbrey, Richard, Porta, Nuria, Ribeiro, Ana, Zugni, Fabio, D'Erme, Luca, Castagnoli, Francesca, Dragan, Alina, Donners, Ricardo, Messiou, Christina, Tunariu, Nina, Koh, Dow-Mu

arXiv.org Artificial Intelligence

Quantitative assessment of treatment response in Advanced Prostate Cancer (APC) with bone metastases remains an unmet clinical need. Whole-Body Diffusion-Weighted MRI (WB-DWI) provides two response biomarkers: Total Diffusion Volume (TDV) and global Apparent Diffusion Coefficient (gADC). However, tracking post-treatment changes of TDV and gADC from manually delineated lesions is cumbersome and increases inter-reader variability. We developed software to automate this process. Core technologies include: (i) a weakly-supervised Residual U-Net model generating a skeleton probability map to isolate bone; (ii) a statistical framework for WB-DWI intensity normalisation, obtaining a signal-normalised b=900s/mm^2 (b900) image; and (iii) a shallow convolutional neural network that processes outputs from (i) and (ii) to generate a mask of suspected bone lesions, characterised by higher b900 signal intensity due to restricted water diffusion. This mask is applied to the gADC map to extract TDV and gADC statistics. We tested the tool using expert-defined metastatic bone disease delineations on 66 datasets, assessed repeatability of imaging biomarkers (N=10), and compared software-based response assessment with a construct reference standard (N=118). The average Dice score between manual and automated delineations was 0.6 for lesions within the pelvis and spine, with an average surface distance of 2mm. Relative differences for log-transformed TDV (log-TDV) and median gADC were 8.8% and 5%, respectively. Repeatability analysis showed coefficients of variation of 4.6% for log-TDV and 3.5% for median gADC, with intraclass correlation coefficients of 0.94 or higher. The software achieved 80.5% accuracy, 84.3% sensitivity, and 85.7% specificity in assessing response to treatment. Average computation time was 90s per scan.
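For readers unfamiliar with the Dice score used above to compare manual and automated delineations, here is a minimal sketch. It assumes each mask is represented as a set of voxel coordinates, which is a simplification chosen for illustration and not the representation used by the software described in the abstract; the example masks are invented.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap between two masks given as collections of voxel coordinates.

    Dice = 2 * |A intersect B| / (|A| + |B|); 1.0 means identical masks.
    """
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical 2D masks: 4 manual voxels, 3 automated voxels, 2 shared.
manual = {(0, 0), (0, 1), (1, 0), (1, 1)}
automated = {(0, 1), (1, 1), (1, 2)}
score = dice_score(manual, automated)
print(round(score, 3))  # 2 * 2 / (4 + 3) = 0.571
```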


NHS to offer same-day prostate cancer diagnosis

BBC News

Men with suspected prostate cancer will be able to get a diagnosis from the NHS within a day, under a new trial hailed as a potential game changer for identifying and treating the disease. The 15 hospitals taking part will use AI technology to interpret MRI scans and spot areas of abnormal tissue within minutes, according to NHS England. Scans showing a high cancer risk will be triaged for priority review by a radiologist and patients will be booked for a same-day biopsy. Around one in eight men will develop prostate cancer in their lives, according to Prostate Cancer UK, with research showing it has overtaken breast cancer as the most commonly diagnosed form of the disease in the UK. But unlike breast cancer, there is currently no national screening programme for prostate cancer.


Context-aware deep learning using individualized prior information reduces false positives in disease risk prediction and longitudinal health assessment

Umapathy, Lavanya, Johnson, Patricia M, Dutt, Tarun, Tong, Angela, Nayan, Madhur, Chandarana, Hersh, Sodickson, Daniel K

arXiv.org Artificial Intelligence

Temporal context in medicine is valuable in assessing key changes in patient health over time. We developed a machine learning framework to integrate diverse context from prior visits to improve health monitoring, especially when prior visits are limited and their frequency is variable. Our model first estimates initial risk of disease using medical data from the most recent patient visit, then refines this assessment using information digested from previously collected imaging and/or clinical biomarkers. We applied our framework to prostate cancer (PCa) risk prediction using data from a large population (28,342 patients, 39,013 magnetic resonance imaging scans, 68,931 blood tests) collected over nearly a decade. For predictions of the risk of clinically significant PCa at the time of the visit, integrating prior context directly converted false positives to true negatives, increasing overall specificity while preserving high sensitivity. False positive rates were reduced progressively from 51% to 33% when integrating information from up to three prior imaging examinations, as compared to using data from a single visit, and were further reduced to 24% when also including additional context from prior clinical data. For predicting the risk of PCa within five years of the visit, incorporating prior context reduced false positive rates still further (64% to 9%). Our findings show that information collected over time provides relevant context to enhance the specificity of medical risk prediction. For a wide range of progressive conditions, sufficient reduction of false positive rates using context could offer a pathway to expand longitudinal health monitoring programs to large populations with comparatively low baseline risk of disease, leading to earlier detection and improved health outcomes.


Finding Holes: Pathologist Level Performance Using AI for Cribriform Morphology Detection in Prostate Cancer

Szolnoky, Kelvin, Blilie, Anders, Mulliqi, Nita, Tsuzuki, Toyonori, Samaratunga, Hemamali, Titus, Matteo, Ji, Xiaoyi, Boman, Sol Erika, Gudlaugsson, Einar, Kjosavik, Svein Reidar, Asenjo, José, Gambacorta, Marcello, Libretti, Paolo, Braun, Marcin, Kordek, Radisław, Łowicki, Roman, Delahunt, Brett, Iczkowski, Kenneth A., van der Kwast, Theo, van Leenders, Geert J. L. H., Leite, Katia R. M., Pan, Chin-Chen, Janssen, Emiel Adrianus Maria, Eklund, Martin, Egevad, Lars, Kartasalo, Kimmo

arXiv.org Artificial Intelligence

Background: Cribriform morphology in prostate cancer is a histological feature that indicates poor prognosis and contraindicates active surveillance. However, it remains underreported and subject to significant interobserver variability amongst pathologists. We aimed to develop and validate an AI-based system to improve cribriform pattern detection. Methods: We created a deep learning model using an EfficientNetV2-S encoder with multiple instance learning for end-to-end whole-slide classification. The model was trained on 640 digitised prostate core needle biopsies from 430 patients, collected across three cohorts. It was validated internally (261 slides from 171 patients) and externally (266 slides, 104 patients from three independent cohorts). Internal validation cohorts included laboratories or scanners from the development set, while external cohorts used completely independent instruments and laboratories. Annotations were provided by three expert uropathologists with known high concordance. Additionally, we conducted an inter-rater analysis and compared the model's performance against nine expert uropathologists on 88 slides from the internal validation cohort. Results: The model showed strong internal validation performance (AUC: 0.97, 95% CI: 0.95-0.99; Cohen's kappa: 0.81, 95% CI: 0.72-0.89) and robust external validation (AUC: 0.90, 95% CI: 0.86-0.93; Cohen's kappa: 0.55, 95% CI: 0.45-0.64). In our inter-rater analysis, the model achieved the highest average agreement (Cohen's kappa: 0.66, 95% CI: 0.57-0.74), outperforming all nine pathologists whose Cohen's kappas ranged from 0.35 to 0.62. Conclusion: Our AI model demonstrates pathologist-level performance for cribriform morphology detection in prostate cancer. This approach could enhance diagnostic reliability, standardise reporting, and improve treatment decisions for prostate cancer patients.
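Cohen's kappa, the agreement statistic reported above, corrects raw agreement for the agreement expected by chance. The following is a minimal sketch of the standard two-rater formula, kappa = (p_o - p_e) / (1 - p_e); the function name and the example ratings are invented for illustration and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(ratings_1, ratings_2):
    """Cohen's kappa for two raters labelling the same items."""
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_1)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n

    # Chance agreement from each rater's marginal label frequencies.
    counts_1, counts_2 = Counter(ratings_1), Counter(ratings_2)
    labels = set(counts_1) | set(counts_2)
    expected = sum(counts_1[l] * counts_2[l] for l in labels) / (n * n)

    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten slides (1 = cribriform present, 0 = absent).
rater_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
rater_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))  # (0.80 - 0.52) / (1 - 0.52) = 0.583
```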


Generalisation of automatic tumour segmentation in histopathological whole-slide images across multiple cancer types

Skrede, Ole-Johan, Pradhan, Manohar, Isaksen, Maria Xepapadakis, Hveem, Tarjei Sveinsgjerd, Vlatkovic, Ljiljana, Nesbakken, Arild, Lindemann, Kristina, Kristensen, Gunnar B, Kasius, Jenneke, Zeimet, Alain G, Brustugun, Odd Terje, Busund, Lill-Tove Rasmussen, Richardsen, Elin H, Haug, Erik Skaaheim, Brennhovd, Bjørn, Rewcastle, Emma, Lillesand, Melinda, Kvikstad, Vebjørn, Janssen, Emiel, Kerr, David J, Liestøl, Knut, Albregtsen, Fritz, Kleppe, Andreas

arXiv.org Artificial Intelligence

Deep learning is expected to aid pathologists by automating tasks such as tumour segmentation. We aimed to develop one universal tumour segmentation model for histopathological images and examine its performance in different cancer types. The model was developed using over 20 000 whole-slide images from over 4 000 patients with colorectal, endometrial, lung, or prostate carcinoma. Performance was validated in pre-planned analyses on external cohorts with over 3 000 patients across six cancer types. Exploratory analyses included over 1 500 additional patients from The Cancer Genome Atlas. Average Dice coefficient was over 80% in all validation cohorts with en bloc resection specimens and in The Cancer Genome Atlas cohorts. No loss of performance was observed when comparing the universal model with models specialised on single cancer types. In conclusion, extensive and rigorous evaluations demonstrate that generic tumour segmentation by a single model is possible across cancer types, patient populations, sample preparations, and slide scanners.


Conservative Decisions with Risk Scores

Wei, Yishu, Lee, Wen-Yee, Quaye, George Ekow, Su, Xiaogang

arXiv.org Machine Learning

In binary classification applications, conservative decision-making that allows for abstention can be advantageous. To this end, we introduce a novel approach that determines the optimal cutoff interval for risk scores, which can be directly available or derived from fitted models. Within this interval, the algorithm refrains from making decisions, while outside the interval, classification accuracy is maximized. Our approach is inspired by support vector machines (SVM), but differs in that it minimizes the classification margin rather than maximizing it. We provide the theoretical optimal solution to this problem, which holds important practical implications. Our proposed method not only supports conservative decision-making but also inherently results in a risk-coverage curve. Together with the area under the curve (AUC), this curve can serve as a comprehensive performance metric for evaluating and comparing classifiers, akin to the receiver operating characteristic (ROC) curve. To investigate and illustrate our approach, we conduct both simulation studies and a real-world case study in the context of diagnosing prostate cancer.
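The notion of abstaining inside a cutoff interval can be illustrated with a small sketch. It does not implement the authors' method for finding the optimal interval; the interval here is supplied by hand, and the function name, scores, and labels are all invented for illustration. It only shows how coverage (the fraction of cases decided) and accuracy on the decided cases are computed for a given interval, which are the two quantities traded off on a risk-coverage curve.

```python
def evaluate_with_abstention(scores, labels, lower, upper):
    """Abstain when lower <= score <= upper; otherwise classify.

    Scores above `upper` are predicted positive, scores below `lower`
    negative. Returns (coverage, accuracy on decided cases); accuracy
    is None when every case is abstained on.
    """
    decided = [(s, y) for s, y in zip(scores, labels) if s < lower or s > upper]
    coverage = len(decided) / len(scores)
    if not decided:
        return coverage, None
    correct = sum((s > upper) == bool(y) for s, y in decided)
    return coverage, correct / len(decided)


# Hypothetical risk scores and true labels (1 = disease present).
scores = [0.05, 0.2, 0.35, 0.45, 0.55, 0.6, 0.8, 0.95]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

# Single cutoff at 0.5 (no abstention) versus abstaining on [0.4, 0.7].
full_coverage, full_accuracy = evaluate_with_abstention(scores, labels, 0.5, 0.5)
coverage, accuracy = evaluate_with_abstention(scores, labels, 0.4, 0.7)
print(full_coverage, full_accuracy)  # 1.0 0.75
print(coverage, accuracy)            # 0.625 0.8
```

In this toy example, widening the interval lowers coverage but raises accuracy on the cases that are still decided.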


Non-Invasive Detection of PROState Cancer with Novel Time-Dependent Diffusion MRI and AI-Enhanced Quantitative Radiological Interpretation: PROS-TD-AI

Ramos, Baltasar, Garrido, Cristian, Narváez, Paulette, Claro, Santiago Gelerstein, Li, Haotian, Salvador, Rafael, Vásquez-Venegas, Constanza, Gallegos, Iván, Zhang, Yi, Castañeda, Víctor, Acevedo, Cristian, Wu, Dan, Cárdenas, Gonzalo, Sotomayor, Camilo G.

arXiv.org Artificial Intelligence

Prostate cancer (PCa) is the most frequently diagnosed malignancy in men and the eighth leading cause of cancer death worldwide. Multiparametric MRI (mpMRI) has become central to the diagnostic pathway for men at intermediate risk, improving detection of clinically significant PCa (csPCa) while reducing unnecessary biopsies and over-diagnosis. However, mpMRI remains limited by false positives, false negatives, and moderate to substantial interobserver agreement. Time-dependent diffusion (TDD) MRI, a novel sequence that enables tissue microstructure characterization, has shown encouraging preclinical performance in distinguishing clinically significant from insignificant PCa. Combining TDD-derived metrics with machine learning may provide robust, zone-specific risk prediction with less dependence on reader training and improved accuracy compared to current standard-of-care. This study protocol outlines the rationale and describes the prospective evaluation of a home-developed AI-enhanced TDD-MRI software (PROSTDAI) in routine diagnostic care, assessing its added value against PI-RADS v2.1 and validating results against MRI-guided prostate biopsy.


Smart Trial: Evaluating the Use of Large Language Models for Recruiting Clinical Trial Participants via Social Media

Zhou, Xiaofan, Wang, Zisu, Krieger, Janice, Zalake, Mohan, Cheng, Lu

arXiv.org Artificial Intelligence

Clinical trials (CT) are essential for advancing medical research and treatment, yet efficiently recruiting eligible participants -- each of whom must meet complex eligibility criteria -- remains a significant challenge. Traditional recruitment approaches, such as advertisements or electronic health record screening within hospitals, are often time-consuming and geographically constrained. This work addresses the recruitment challenge by leveraging the vast amount of health-related information individuals share on social media platforms. With the emergence of powerful large language models (LLMs) capable of sophisticated text understanding, we pose the central research question: Can LLM-driven tools facilitate CT recruitment by identifying potential participants through their engagement on social media? To investigate this question, we introduce TRIALQA, a novel dataset comprising two social media collections from the subreddits on colon cancer and prostate cancer. Using eligibility criteria from public real-world CTs, experienced annotators are hired to annotate TRIALQA to indicate (1) whether a social media user meets a given eligibility criterion and (2) the user's stated reasons for interest in participating in CT. We benchmark seven widely used LLMs on these two prediction tasks, employing six distinct training and inference strategies. Our extensive experiments reveal that, while LLMs show considerable promise, they still face challenges in performing the complex, multi-hop reasoning needed to accurately assess eligibility criteria.